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Abstract 


Recently  the  Multi-Level  algorithm  was  introduced  as  a  general  purpose 
solver  for  the  solution  of  steady  state  Markov  chains.  In  this  paper  we  con¬ 
sider  the  performance  of  the  Multi-Level  algorithm  for  solving  Nearly  Com¬ 
pletely  Decomposable  (NCD)  Markov  chains,  for  which  special-purpose  iter¬ 
ative  aggregation/disaggregation  algorithms  such  as  the  Koury-McAllister- 
Stewart  (KMS)  method  have  been  developed  that  can  exploit  the  decompos- 
ability  of  the  the  Markov  chain.  We  present  experimental  results  indicating 
that  the  general-purpose  Multi-Level  algorithm  is  competitive,  and  can  be 
significantly  faster  than  the  special-purpose  KMS  algorithm  when  Gauss- 
Seidel  and  Gaussian  Elimination  are  used  for  solving  the  individual  blocks. 


Availability  Ccn<er 

Avaii  j  .. .  c.'. 
o>ai 


‘This  research  was  supported  by  the  National  Aeronautics  and  Space  Administration  under  NASA  contract  NAS1- 
19480  while  the  second  author  was  in  residence  at  the  Institute  for  Computer  Applications  in  Science  and  Engineering, 
NASA  Langley  Research  Center. 

’This  work  was  carried  out  in  part  while  the  second  author  was  a  guest  at  ICASE,  NASA  Langley  Research  Center. 


,DTIC  QUALITY  INSPECTED  3 


□  □ 


1  Introduction 


Recently  the  Multi-Level  (ML)  algorithm  was  introduced  as  a  fast  solver  for  general  steady-state  Markov 
chains  [2].  The  algorithm  has  been  shown  to  be  faster,  often  one  or  two  orders  of  magnitude  faster 
relative  to  the  Gauss-Seidel  and  optimized  SOR  algorithms  for  a  number  of  different  Markov  chains. 
The  study  showed  the  potential  of  the  algorithm,  but  in  the  light  of  the  small  number  of  test  problems 
represented  only  a  first  step  in  determining  whether  the  excellent  performance  can  be  realized  for 
all  Markov  chains.  In  this  work  we  investigate  the  utility  of  the  ML  algorithm  for  solving  Nearly 
Completely  Decomposable  (NCD)  Markov  chains.  For  this  class  of  Markov  chains  special-purpose 
iterative  aggrega* ion/disaggregation  (IAD)  algorithms  [3,  11]  have  been  shown  to  perform  well.  Thus, 
demonstration  of  the  utility  of  an  algorithm  for  solving  NCD  chains  necessitates  comparison  with  one  of 
these  IAD  schemes.  We  include  the  Koury-McAllister-Stewart  (KMS)  [3]  algorithm  as  a  representative 
example.  We  also  include  the  Gauss-Seidel  (GS)  method  for  purposes  of  comparison. 

In  our  experiments  we  structure  the  Markov  chains  in  NCD  “normal”,  i.e.  almost  block-diagonal 
form.  Hence  we  give  an  advantage  to  KMS,  since  in  practice  the  algorithm  would  require  computation 
time  for  the  restructuring  of  the  Markov  chain.  However,  Stewart  and  Wu  have  shown  that  this  re¬ 
structuring  time  often  is  not  significant  [9].  Like  Stewart  and  Wu,  we  use  Gaussian  elimination  to  solve 
blocks  of  size  less  than  100  states. 

One  difference  between  our  experiments  and  those  of  Stewart  and  Wu  is  that  they  use  SOR  as  well 
as  GS  to  solve  blocks.  Note  that  we  have  found  using  SOR  with  a  well  chosen  relaxation  parameter  for 
solving  the  blocks  can  decrease  the  computation  time  of  the  KMS  algorithm  by  a  factor  of  3  or  more  for 
many  of  the  chains  we  consider,  but  determining  the  correct  parameter  is  not  a  trivial  task.  Another 
difference  is  that  when  Stewart  and  Wu  use  SOR  or  GS  to  solve  a  block,  they  perform  a  fixed  number 
of  iterations  on  the  block  whereas  we  perform  iterations  until  the  solution  does  not  improve  by  more 
than  c  in  any  state,  for  some  c. 

The  rest  of  the  paper  is  organized  as  follows.  In  section  2  we  provide  some  background  material  on 
NCD  chains.  In  sections  3  and  4  we  briefly  describe  the  ML  and  KMS  algorithms.  Section  5  contains 
our  experimental  results.  The  final  section  contains  our  conclusions. 


2  Nearly  Completely  Decomposable  Markov  Chains 


In  this  section  we  give  a  brief  informal  description  of  NCD  chains  and  give  two  examples  that  we  use 
in  our  experimental  studies.  The  seminal  work  on  NCD  systems  was  conducted  by  Simon  and  Ando 
[6],  and  applied  to  Markov  chains  by  Courtois  [1].  For  a  more  detailed  treatment  we  refer  the  reader  to 
Stewart  [9,  10].  An  NCD  Markov  chain  with  B  blocks  can  be  ordered  so  that  the  generator  matrix  can 
be  written  in  block  form  as: 


Q  n 

Q 12  • 

•  Q\b 

Q  21 
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Figure  1:  Sample  Nearly  Completely  Decomposable  Birth-Death  Markov  Chain 

where  the  elements  of  the  off-diagonal  blocks  Qij,  i  /  j  are  small  compared  to  the  elements  of  the 
diagonal  blocks  Qa.  The  larger  the  elements  of  the  off-diagonal  blocks,  the  less  NCD  the  chain  becomes. 
Note  that  for  a  completely  decomposable  chain  the  off-diagonal  blocks  would  contain  only  zeros  and 
the  Markov  chain  degenerates  into  a  set  of  independent  problems.  Any  solution  algorithm  that  makes 
use  of  the  block  structure  will  have  to  invest  some  pre-processing  effort  into  permuting  the  matrix  into 
the  above  almost  block-diagonal  form. 

In  this  paper  we  consider  two  different  example  NCD  Markov  chain  structures. 


2.1  Example  1 

The  first  example  is  a  modified  birth-death  chain  that  is  subdivided  into  B  blocks  of  size  N.  The  birth 
rate  between  blocks  is  £x  the  birth  rate  within  each  block,  and  the  death  rate  between  blocks  is  6x 
the  internal  death  rate.  The  parameter  6  thus  controls  the  degree  of  decomposability  of  the  chain.  In 
figure  1  we  provide  an  example  with  2  blocks,  three  states  per  block,  a  birth  (death)  rate  of  1  (2),  and 
6  —  0.001.  In  our  experiments  we  will  vary  both  the  number  of  blocks  for  a  given  chain  size  and  the 
inter-block  coupling  strength  6. 

2.2  Example  2 

The  second  example  NCD  Markov  chain  structure,  shown  in  figure  2,  is  the  interactive  computer 
system  model  used  by  Stewart  [4,  8,  9].  In  this  model  a  CPU  processes  jobs  arriving  from  a  set  of 
terminals,  which  may  subsequently  require  service  from  a  filing  device  (FD)  or  secondary  memory 
(SM).  Let  Lx  equal  the  length  at  queue  “ x ”.  Let  17  equal  the  number  of  jobs  in  the  system,  giving 
t)  =  Lcpu  +  Lfd  +  Lsm-  We  assume  there  is  one  job  per  terminal,  and  that  the  terminals  are  modeled 
as  an  infinite  server.  The  Markov  chain  is  decomposed  into  B  +  1  blocks,  where  B  is  the  number  of 
terminals  and  block  i  is  the  set  of  all  states  which  have  i  jobs  in  the  system.  These  blocks  form  an  NCD 
chain  when  the  terminal  service  rate  and  the  probability  of  returning  to  the  terminals  after  visiting  the 
CPU  are  small. 

We  assume  the  same  parameter  values  as  are  used  in  [9]:  the  rate  leaving  the  terminals  is  70^55, 
the  rate  leaving  the  SM  device  is  0.2,  the  rate  leaving  the  FD  is  the  rate  from  the  CPU  to  the  FD  is 
0.05,  the  rate  from  the  CPU  to  the  terminals  is  0.002,  and  the  rate  from  the  CPU  to  the  SM  device  is 
100(^g)15.  This  last  rate  is  meant  to  model  the  rate  at  which  page  faults  occurs  and  increases  with  the 
number  of  currently  executing  jobs.  For  a  more  detailed  description  of  the  model  we  refer  the  reader  to 
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Figure  2:  Multiprogrammed  Queueing  Network  Model 


[9,  4]. 


3  Multi-Level  Algorithm 


In  this  section  we  briefly  review  the  recently  introduced  multi-level  algorithm,  details  of  which  can  be 
found  in  [2].  Consider  a  Markov  chain  consisting  of  n  states  so . . .  s„_ j.  Denote  the  unknown  vector  by 
p,  where  p,  is  the  probability  of  finding  the  Markov  chain  in  state  s,-. 

We  then  have  to  solve  the  system  of  equations 

PP  =  0  (2) 


with  the  additional  condition 

*==n— 1 

£  Pi  =  1  (3) 

t=0 

This  equation  is  usually  written  as  ttQ  =  0  for  ir  —  pT  and  Q  =  PT ,  the  infinitesimal  generator  matrix. 

A  coarser  representation  of  the  Markov  chain  described  by  matrix  P  may  be  obtained  by  aggre¬ 
gation.  This  means  creating  a  new  Markov  chain  described  by  a  matrix  Q  with  the  vector  of  state 
probabilities  q,  each  of  whose  N  states  Sq.  . .  is  derived  from  a  number  of  states  of  the  original 
system. 

In  the  following  we  will  use  the  terms  fine  level  and  coarse  level  to  refer  to  Markov  chains  where 
the  latter  is  obtained  by  aggregation  from  the  former.  The  relation  s/t  6  5,  signifies  that  the  fine  level 
state  s/c  is  mapped  by  the  aggregation  operation  to  the  coarse  level  state  5,. 

The  matrix  Q  of  the  aggregated  system  is  chosen  as  follows  : 


Qa  = 


E  Pk  E  Plk 

si£Sj 

TT7k 

•k£S, 


(4) 
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procedure  ml(l) 
if  (/  =  0) 

solve  Pipi  =  0 
else 

pi  -  GS1'  (pi) 

Pi- 1  *  Ri-i,i(P‘) 

ml(/  —  1) 

Pj-i  *  Pl-i/Pl-i 
Pj  *  h-u(P*-i) 
Pi  =  C{pi,pi) 
return 


Figure  3:  Multi-Level  Algorithm 

This  is  the  well-known  aggregation  matrix.  Note  that  the  matrix  Q  is  a  function  not  only  of  the  fine 
level  matrix  P,  but  also  of  the  fine  level  solution  vector  p. 

This  yields  the  aggregated  equations  in  the  unknown  q : 

Qq  =  0 

N- 1 

5>  =  !  • 

«=o 

It  can  then  be  shown  that 

=  5Z  Pk  ,  (5) 

i.e.  the  solution  q  of  the  aggregated  system  truly  represents  a  coarser  version  of  the  solution  p  of  the 
original  problem.  The  probability  of  being  in  state  qi  is  the  sum  of  the  probabilities  of  being  in  any  of 
its  constituent  fine-level  states.  We  use  the  aggregation  equation  as  a  basis  for  the  multi-level  method, 
whereby  we  approximate  the  exact  solution  values  pk  in  (4)  above  by  values  from  the  current  iterate. 

The  ML  algorithm  can  be  viewed  as  a  recursive  aggregation/disaggregation  algorithm.  We  adopt 
the  following  abbreviations  for  vectors  a,  b,  c  £  IFF1 : 

a  =  b*c  =  a,  =bi*Ci,  1  <i<m 
a  =  b/c  =  a,  =  bi/c{,  l  <  i  <  m 

The  two-level  version  of  the  ML  iteration  is  given  by  the  following  sequence  of  steps. 

•  Perform  Gauss-Seidel  relaxation  on  current  iterate  p('\  yielding  p 

p  =  GS(pU)  (6) 

•  Restrict  current  solution  p  to  coarse-level  vector  q 

q  =  R(p)  =  qi=  pk  (7) 

Sk€S, 
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•  Compute  coarse  matrix  Q 


Qji  = 


£  Pk  £  Pik 

it 


•  Solve  coarse  equation  for  q 


•  Compute  coarse  level  correction  q * 


•  Compute  fine  level  correction  p* 


N-l 

Qq  =  o  ,  g,  =  1 

t'=0 


q*  =  q/q 


p*  ~  i(qm)  =  Pi  =  q* 

•  Apply  fine  level  correction  to  obtain  new  iterate  p(,+1) 

p(*+D=p  =  C(p,p*)  =  p  +  p* 


(8) 

(9) 

(10) 


(11) 


(12) 


In  this  two-level  form  the  method  is  similar  to  well-known  iterative  aggregation/disaggregation 
(LAD)  methods  such  as  those  of  Koury,  McAllister  and  Stewart  [3]  and  of  Takahashi  [11].  The  multi¬ 
level  algorithm  is  obtained  by  recursive  application  of  the  two-level  algorithm  to  obtain  a  solution  to 
the  aggregated  equation  (9)  and  is  described  in  algorithmic  form  in  figure  3.  We  use  the  subscript  l 
to  denote  level  of  representation  (/  =  Imax  finest  level,  i.e.  the  original  Markov  chain,  l  =  0  coarsest 
level).  The  coarse  level  /  —I  and  fine  level  l  between  which  the  operators  I  and  R  map  are  identified  by 
appropriate  indices.  Note  that,  because  of  the  recursive  nature  of  the  algorithm,  the  unknowns  q*,  q  and 
q  are  represented  by  the  variables  p*_j,  pi- 1  and  pi- j,  respectively.  We  allow  in  general  the  possibility 
of  applying  GS  v  times  at  each  level  with  v  >  1,  denoted  by  GS". 

The  performance  of  the  policy  can  be  greatly  influenced  by  the  aggregation  strategy  used.  The 
aggregation  strategy  used  for  all  experiments  in  this  paper  attempts  to  aggregate  pairs  (or  triples)  of 
fine  level  states  that  are  strongly  coupled.  Let  $  be  the  maximum  number  of  fine  level  states  allowed 
to  be  aggregated  into  a  single  coarse  level  state.  For  all  experiments  in  this  paper  ¥  is  set  to  three. 
Let  a  be  the  aggregation  rate  differential,  used  for  determining  whether  two  states  are  coupled  strongly 
enough  to  aggregate  them  together.  For  all  experiments  in  this  paper  we  set  a  =  0.5.  We  loop  though 
all  states  of  a  given  level  and  for  each  state  Sj  that  has  not  yet  been  assigned  to  an  aggregated  state: 

1.  Let  absolutemax  =  the  maximum  of  {Pij,Pji}  Vj. 

2.  Let  availablemax  =  the  maximum  of  {Pij,  Pji }  Vj  such  that  Sj  has  not  yet  been  assigned  to  an 
aggregated  state  whose  number  of  constituent  fine  level  states  is  equal  to 

3.  If  availablemax  >  (a  *  absolutemax),  then  aggregate  s,  with  sr  Note,  it  is  possible  that  sj  is 
already  in  an  aggregate  of  size  ^  —  1  in  which  case  Sj  is  added  to  the  aggregate  to  make  it  of  size 

9. 
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4.  If  availablemax  <  (a  *  absolutemax ),  then  map  Sj  to  a  new  coarse  level  state  by  itself.  Another 
state  visited  latter  in  the  aggregation  step  may  be  aggregated  with  this  state,  if  not  the  coarse 
state  remains  composed  of  only  this  one  fine  level  state. 

The  goal  of  this  aggregation  method  is  to  avoid  coupling  weakly  connected  states  when  in  the 
presence  of  a  strong  connection  to  some  other  state,  and  also  to  avoid  the  creation  of  large  aggregates. 
The  parameter  a  ensures  that  states  are  strongly  connected  enough  before  aggregating  them  together, 
and  the  use  of  singletons  avoids  creation  of  large  aggregates.  This  aggregation  strategy  takes  strongly 
differing  rates  into  account  and  therefore  will  aggregate  within  blocks  when  the  chain  is  NCD.  As  a  result, 
strongly  coupled  states  are  aggregated  together  and  at  some  intermediate  level  the  ML  aggregation  will 
likely  correspond  to  the  KMS  coarse  level. 

4  KMS  Algorithm 

Iterative  Aggregation/Disaggregation  (LAD)  methods  are  a  well-known  class  of  algorithms  for  the  steady 
state  solution  of  Markov  chains  and  which  bear  a  close  relationship  to  the  ML  method  presented  here. 
LAD  methods  are  reviewed  in  [7].  The  generic  IAD  method  is  that  of  Koury,  McAllister  and  Stewart 
(KMS)  [3].  Using  the  notation  of  section  3,  the  KMS  method  is  defined  as  follows: 

1.  Construct  coarse  level  matrix  Q  using  (8) 

2.  Solve  coarse  system  (9). 

3.  Perform  the  correction  (11),  (12). 

4.  For  each  coarse  level  state  Si  solve  the  set  of  equations  connecting  all  fine  level  states  s*  aggregated 
to  it  (sjt  €  5j). 

5.  If  not  converged  goto  1 

Step  (4)  is  a  Block  Gauss-Seidel  step  on  the  finer  level,  where  the  blocks  are  defined  by  the  aggregated 
system  corresponding  to  (1). 

We  now  see  that  the  KMS  algorithm  is  a  special  case  of  ML,  obtained  by  the  following  choices: 

1.  Use  of  only  two  levels  of  representation  of  the  system,  rather  than  recursively  coarsened  problems. 

2.  The  number  of  fine  unknowns  aggregated  into  a  single  state  is  large,  whereas  for  ML  the  number 
of  unknowns  is  between  one  and  three,  usually  two  or  three. 

3.  Use  of  Block  Gauss-Seidel  on  the  finer  level,  as  opposed  to  a  pointwise  Gauss-Seidel  scheme. 

One  open  question  regarding  the  KMS  algorithm  is  how  much  work  per  global  iteration  should  be 
done  solving  each  block  on  the  finer  level.  In  Stewart  [10,  9)  SOR  was  applied  for  a  fixed  number  of 
iterations.  We  chose  instead  to  solve  with  GS  until  the  solution  does  not  improve  by  more  than  t  in 
any  state,  for  (  values  of  le-06,  le-09,  and  le-11. 


Gaussian  elimination  for  the  solution  of  the  blocks  is  generally  faster  than  solution  by  Gauss-Seidel 
iteration  when  implemented  in  a  way  that  preserves  the  sparsity  of  the  matrices  involved  and  these  are 
of  suitably  small  dimension.  For  larger  blocks  both  the  memory  and  cpu  requirements  force  the  switch 
to  the  iterative  scheme.  On  the  other  hand,  when  the  chain  is  NCD  the  KMS  algorithm  usually  obtains 
a  fairly  accurate  solution  after  the  first  iteration,  meaning  that  Gauss-Seidel  already  has  a  good  initial 
guess  and  requires  only  a  few  iterations  to  convergence.  In  this  case  iteration  is  to  be  preferred  over 
elimination.  As  done  in  [9],  we  use  Gaussian  elimination  for  all  blocks  comprised  of  fewer  than  100 
states. 

IAD  methods  such  as  this  suffer  the  drawback  that  they  are  only  applicable  to  NCD  Markov  chains. 
The  aggregation  is  then  defined  by  assigning  one  coarse  unknown  to  each  block.  The  number  of  fine 
unknowns  that  are  aggregated  to  a  single  coarse  state  may  be  quite  large. 

5  Experimental  Results 

In  this  section  we  describe  the  experiments  and  the  results  obtained.  In  all  figures,  the  lines  are  plotted 
in  the  same  order  they  appear  in  the  legends.  We  first  present  results  from  the  generalized  birth  death 
structure  described  in  section  2.1.  This  trivial  example  allows  us  to  investigate  how  changing  the  number 
of  states  per  block  and  decreasing  the  decomposability  affect  the  relative  performance  of  the  policies. 
Note,  in  practice  this  chain  would  be  solved  analytically,  we  only  solve  it  numerically  to  explore  the 
relative  solution  speed  of  the  algorithms. 

For  both  experiments  we  fix  the  number  of  states  in  the  chain  at  1000,  and  set  the  birth  (death) 
rate  to  be  50000  (40000).  We  choose  large  rates  to  prevent  numerical  underflow.  In  all  figures  we  include 
data  points  for  five  algorithms:  GS,  ML,  KMS-06,  KMS-09,  and  KMS-11,  where  KMS-xx  denotes  the 
KMS  algorithm  assuming  that  block  GS  is  applied  on  the  blocks  until  the  solution  does  not  improve 
by  more  than  le-xx  in  any  state.  We  also  ran  the  experiments  using  KMS-04,  not  shown,  and  the 
performance  is  very  close  to  that  of  KMS-06. 

In  the  first  experiment  we  determine  what  effect  the  number  of  states  per  block  has  on  the  perfor¬ 
mance  of  the  policies  while  keeping  the  problem  size  fixed.  This  is  done  by  setting  6  =  0.01  and  varying 
the  number  of  blocks  since  a  small  number  of  blocks  results  in  a  large  number  of  states  per  block.  For 
this  experiment  we  do  not  use  Gaussian  elimination  for  solving  any  of  the  blocks  in  KMS  since  this 
would  prohibit  us  from  attributing  results  to  a  single  factor.  If  we  used  Gaussian  elimination  for  small 
blocks  the  algorithm  would  change  as  the  number  of  blocks  changes,  thus  making  the  results  difficult 
to  interpret. 

In  figure  4  we  plot  the  results.  We  include  plots  for  the  number  of  iterations,  the  overall  floating 
point  operation  count  in  millions  of  floating  point  operations  (MFLOPS),  the  time  in  seconds,  and 
the  ratio  of  flops  relative  to  the  ML  algorithm.  The  time  metric  was  obtained  using  the  Unix  times 
system  call.  The  time  metric  results  in  similar  comparisons,  but  is  dependent  on  the  efficiency  of  the 
implementation  of  the  algorithms.  In  addition,  we  have  found  the  timings  to  fluctuate  do  to  interference 
form  other  jobs  on  the  workstation,  hence  we  focus  on  flops  and  iterations  in  subsequent  experiments. 

First  consider  the  number  of  iterations  needed.  The  behavior  of  KMS-11  is  exactly  what  we  would 
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expect.  KMS-11  requires  a  very  small  number  of  global  iterations  since  the  chain  is  strongly  NCD  and 
the  individual  blocks  are  solved  to  a  high  level  of  accuracy.  The  number  of  global  iterations  increases  as 
the  required  accuracy  of  solving  the  individual  blocks  decreases.  Note  that  the  ML  algorithm  requires 
more  iterations  than  KMS-11  but  less  than  the  other  algorithms,  and  that  GS  requires  a  large  number 
of  iterations. 

Now  consider  the  flops  metric.  The  ML  algorithm  requires  significantly  less  computation  than  any 
of  the  other  algorithms  and  the  ordering  of  the  KMS  policies  is  reversed  when  considering  flops  instead  of 
iterations.  The  KMS-11  algorithm  may  require  fewer  global  iterations  than  the  other  KMS  algorithms, 
but  each  iteration  requires  enough  additional  work  to  outweigh  the  reduced  number  of  iterations.  Note 
that  Stewart  [9,  10]  pointed  out  that  increasing  the  number  of  iterations  when  solving  each  block  may 
or  may  not  decrease  overall  solution  time.  In  our  experiments  we  find  as  a  general  rule  that  decreasing 
the  accuracy  required  of  each  individual  block  solve  decreases  the  overall  solution  time. 

The  performance  of  the  KMS  algorithm  decreases  as  the  block  size  increases  because  both  block 
solvers  considered  (GS  and  Gaussian  elimination)  have  a  superlinear  increase  in  operation  counts  in  the 
number  of  unknowns.  The  saving  per  block  outweighs  the  additional  number  of  blocks  now  needed  to 
be  solved.  The  blocks  used  in  this  experiment  were  only  of  very  modest  size,  <  500.  As  the  blocks 
increase  in  size  ML  becomes  faster  with  respect  to  GS  ot  Gaussian  elimination,  thus  the  superiority  of 
ML  will  increase  with  the  system  size. 

We  now  explore  the  sensitivity  of  the  algorithms  to  the  degree  of  decomposability  of  the  Markov 
chain  by  varying  the  parameter  6.  Our  implementation  of  the  KMS  algorithm  always  assumes  the  same 
block  structure  regardless  of  6,  hence  by  increasing  parameter  6  we  can  decrease  the  decomposability. 
We  plot  the  results  in  figure  5.  The  performance  of  the  KMS-11  and  KMS-9  algorithms  quickly  degrades 
as  S  increases.  The  is  not  unexpected  since  the  policy  is  designed  to  work  on  an  NCD  chain.  The  fact 
that  it  converges  for  high  values  of  6  is  encouraging.  Note  that  KMS-6  is  less  sensitive  to  6.  The 
ML  algorithm  is  almost  completely  insensitive  to  6.  This  too  is  not  particularly  surprising  since  we 
previously  showed  the  excellent  performance  of  the  ML  algorithm  for  non-NCD  chains  [2]. 

We  now  consider  the  less  trivial  Markov  chain  of  the  queueing  network  model  of  the  multipro- 
grammed  computer  system  described  in  section  2.2.  For  these  experiments  we  assume  all  KMS  algo¬ 
rithms  use  Gaussian  elimination  for  solving  blocks  with  fewer  than  100  states.  The  number  of  states 
per  block  is  presented  in  table  1.  For  a  population  of  20  there  are  21  blocks,  for  a  population  of  50  jobs 
there  are  51  blocks. 

In  figure  G  we  plot  the  results  of  the  algorithms  solving  this  model  for  a  population  of  20  when  the 
rate  from  the  cpu  to  the  terminal  is  varied  from  0.001  to  0.02.  Note  that  a  value  of  0.002  was  used  in  the 
experiments  found  in  [9].  Our  original  intention  in  varying  the  rate  from  the  cpu  to  the  terminals  was 
to  determine  how  a  change  in  the  decomposability  of  the  Markov  chain  affects  the  relative  performance 
of  the  algorithms.  The  general  trend  noted  from  the  number  of  flops  is  that  as  the  rate  is  increased, 
hence  the  decomposability  decreased,  the  KMS  algorithms  requires  significantly  less  computation.  This 
is  counter-intuitive.  The  reason  can  be  found  in  the  solution  values  of  the  Markov  chains.  In  figure  7 
we  plot  the  probability  mass  functions  for  the  solution  on  a  block  basis.  These  plots  were  obtained  by 
summing  the  probability  mass  over  all  states  within  each  of  the  21  blocks.  As  the  rate  of  going  from 
the  cpu  to  the  terminals  is  increased,  probability  mass  shifts  from  the  blocks  with  a  large  number  of 
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666 

703 
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1128 

1176 
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1326 

Table  1:  Sizes  of  the  51  Blocks  for  Example  2 

states  (corresponding  to  a  large  number  of  active  jobs)  to  the  blocks  with  a  small  number  of  states 
l  corresponding  to  a  small  number  of  active  jobs).  Hence,  the  problem  becomes  trivial  for  the  KMS 
algorithm  to  solve  since  the  blocks  with  a  large  number  of  states  require  only  one  GS  iteration.  Thus, 
we  find  the  performance  of  the  KMS  algorithm  is  very  dependent  on  the  solution  to  the  Markov  chain 
if  the  solution  results  in  blocks  whose  probability  mass  is  significantly  smaller  than  the  tolerance  used 
for  determining  the  stopping  point  of  solving  each  block. 

In  figure  8  we  plot  the  results  assuming  a  population  of  50  jobs.  At  a  cpu  to  terminal  rate  of 
0.001  most  of  the  probability  mass  in  the  solution  vector  is  in  the  last  few  blocks,  hence  KMS  performs 
significantly  worse  than  ML.  For  higher  rates  the  mass  shifts  to  the  blocks  with  a  small  number  of  states 
and  the  performance  of  the  KMS  policies  improves  as  in  figure  6. 


6  Conclusions 

Our  experimental  evidence  indicates  that  the  ML  algorithm  is  an  efficient  solver  for  NCD  Markov  chains. 
The  algorithm  is  also  insensitive  to  the  degree  of  decomposability  and  the  size  of  the  blocks  in  the  chain 
for  the  examples  considered.  These  results  strengthen  the  case  that  the  ML  algorithm  is  an  efficient 
general  purpose  solver. 

Our  results  also  indicate  that  for  the  problems  we  have  considered,  the  ML  algorithm  is  faster  than 
the  KMS  algorithm  in  its  proposed  form,  i.e.  using  GS  (Gaussian  elimination)  for  solving  blocks  when 
the  number  of  states  in  the  block  is  >  100  (<  100).  We  attribute  the  superior  performance  of  the  ML 
algorithm  to  the  fact  that  the  ML  algorithm  is  much  faster  than  GS  or  Gaussian  elimination  for  solving 
large  blocks. 

At  first  it  may  seem  surprising  that  the  ML  algorithm,  not  explicitly  designed  for  NCD  chains, 
should  work  so  well.  They  key  to  understanding  the  good  performance  of  the  ML  algorithm  is  the 
aggregation  strategy.  Since  the  algorithm  aggregates  strongly  connected  states  together  it  has  a  similar 
effect  as  the  IAD  algorithms.  In  fact,  for  some  small  examples  of  our  generalized  birth  death  chain  we 
have  verified  that  the  aggregated  chain  at  some  coarse  level  is  the  same  chain  found  at  the  coarse  level 
of  the  KMS  algorithm. 

A  secondary  contribution  of  this  work  is  the  discovery  that  the  solution  speed  of  the  KMS  algorithm 
(and  we  assume  all  traditional  iterative  aggregation/disaggregation  algorithms)  can  be  strongly  affected 
by  the  solution  values  of  the  Markov  chain.  Note  that  the  ML  algorithm  is  much  less  sensitive  to  the 
solution  values  of  the  Markov  chain. 
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The  KMS  algorithm  is  an  attractive  algorithm  since  it  easily  parallelizes  and  allows  the  efficient 
solution  of  problems  that  are  larger  than  memory  when  the  chain  is  NCD.  We  conclude  therefore  that  if 
KMS  is  to  be  used,  the  ML  algorithm  should  be  used  to  solve  the  individual  blocks  since  ML  is  a  better 
solver  for  large  blocks  than  GS,  Gaussian  elimination,  or  SOR.  We  plan  to  investigate  this  merger  of 
the  two  algorithms  in  the  near  future. 
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Figure  4:  NCD  Birth- Death  chain.  1000  states,  Number  of  blocks  varied.  Upper  Left:  MFLOPs;  Upper 
Right:  Number  of  Iterations;  Lower  Left:  Computation  time;  Lower  Right:  ratio  of  FLOPs  to  ML. 
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Figure  5:  NCD  Birth-Death  chain.  1000  states,  20  blocks,  S  varied.  Left:  MFLOPs;  Right:  Number  of 
Iterations. 


Figure  6:  Interactive  Computer  System  Model,  20  jobs,  processing  rate  varied.  Left:  MFLOPs;  Right 
Number  of  Iterations. 
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Figure  7:  Interactive  Computer  System  Model  20  Jobs,  selected  block  probabilities 


Figure  8:  Interactive  Computer  System  Model,  50  jobs,  processing  rate  varied.  Left:  MFLOPs;  Right: 
Number  of  Iterations. 
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